Improved Data Partitioning for Building Large ROLAP Data Cubes in Parallel
Abstract
The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems and can be instrumental in accelerating data mining tasks in large data warehouses. However, as the size of data warehouses grows, the time it takes to perform this pre-computation becomes a significant performance bottleneck. This paper presents an improved parallel method for generating ROLAP data cubes on a shared-nothing multiprocessor, based on a novel optimized data partitioning technique. Since no shared disk is required, our method can be used on highly scalable processor clusters consisting of standard PCs with local disks only, connected via a data switch. The approach, which uses a ROLAP representation of the data cube, is well suited for large data warehouses and high-dimensional data, and supports the generation of both fully materialized and partially materialized data cubes. We have implemented our new parallel shared-nothing data cube generation method and evaluated the impact of our novel optimized data partitioning technique. The experiments show a significant performance improvement. As a result, our new optimized parallel data cube generation method achieves close to optimal speedup for as many as 32 processors, generating a full data cube for a fact table with 16 million rows and 8 attributes in under 7 minutes. For a fact table with 256 million rows and 8 attributes, our improved method reaches optimal speedup for 32 processors, generating a full data cube consisting of ≈ 7 billion rows (200 Gigabytes) in under 88 minutes. In comparison with previous approaches, our new method significantly improves scalability with respect to both the number of processors and the I/O bandwidth (number of parallel disks).
Keywords: Data Cube, ROLAP, Parallel Computing.
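To make the term "full data cube" concrete, the following minimal sketch materializes every group-by (view) of a tiny fact table, one aggregated table per subset of the dimensions, as in a ROLAP representation. It is a naive single-process illustration, not the paper's parallel method or its partitioning technique; the function name `full_cube`, the sample table, and the choice of SUM as the aggregate are assumptions made for this example. A fact table with d dimensions has 2^d views, so the 8-attribute tables in the abstract correspond to 2^8 = 256 views.

```python
from itertools import combinations
from collections import defaultdict


def full_cube(rows, dims, measure):
    """Naively build all 2^d group-bys of `rows` (hypothetical helper).

    Returns a dict mapping each subset of `dims` (as a tuple) to a
    dict {group key tuple: summed measure}.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group_by in combinations(dims, k):
            view = defaultdict(int)
            for row in rows:
                # Project the row onto the chosen dimensions.
                key = tuple(row[d] for d in group_by)
                view[key] += row[measure]
            cube[group_by] = dict(view)
    return cube


# Illustrative fact table (made-up data, not from the paper).
fact_table = [
    {"store": "A", "product": "x", "year": 2005, "sales": 10},
    {"store": "A", "product": "y", "year": 2005, "sales": 5},
    {"store": "B", "product": "x", "year": 2006, "sales": 7},
]

cube = full_cube(fact_table, ("store", "product", "year"), "sales")
print(len(cube))          # 8 views for 3 dimensions
print(cube[()])           # grand total: {(): 22}
print(cube[("store",)])   # {('A',): 15, ('B',): 7}
```

This sketch rescans the fact table once per view, which is the cost that practical cube generators reduce by deriving views from one another and by distributing the work across processors.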
Related resources
Building Large ROLAP Data Cubes in Parallel
The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems and can be instrumental in accelerating data mining tasks in large data warehouses. However, as the size of data warehouses grows, the time it takes to perform this pre-computation becomes a significant performance bottleneck. This paper presents a fast parallel method for...
Parallel Multi-Dimensional ROLAP Indexing
This paper addresses the query performance issue for Relational OLAP (ROLAP) datacubes. We present a distributed multi-dimensional ROLAP indexing scheme which is practical to implement, requires only a small communication volume, and is fully adapted to distributed disks. Our solution is efficient for spatial searches in high dimensions and scalable in terms of data sizes, dimensions, and numbe...
Parallel Multi-Dimensional ROLAP Indexing
This article addresses the query performance issue for Relational OLAP (ROLAP) datacubes. We present RCUBE, a distributed multidimensional ROLAP indexing scheme which is practical to implement, requires only a small communication volume, and is fully adapted to distributed disks. Our solution is efficient for spatial searches in high dimensions and scalable in terms of data sizes, dimensions, a...
RCUBE: Parallel Multi-Dimensional ROLAP Indexing
This paper addresses the query performance issue for Relational OLAP (ROLAP) datacubes. We present RCUBE, a distributed multi-dimensional ROLAP indexing scheme which is practical to implement, requires only a small communication volume, and is fully adapted to distributed disks. Our solution is efficient for spatial searches in high dimensions and scalable in terms of data sizes, dimensions, an...
Lossless Reduction of Datacubes using Partitions
Datacubes are especially useful for efficiently answering queries on data warehouses. Nevertheless, the amount of generated aggregated data is huge with respect to the initial data, which is itself very large. Recent research has addressed the issue of summarizing Datacubes in order to reduce their size. The approach presented in this paper fits in a similar trend. We propose a concise representa...
Journal title: IJDWM
Volume: 2
Publication date: 2006